Stop, Collaborate, and Listen: An ERGM Analysis of School Leader Networks
PREPARE
This study was completed as part of the summer 2021 Social Network Analysis in Education course (ECI 589) at North Carolina State University. The course is the last of the series of courses offered as part of the Learning Analytics Certificate Program.
The R Markdown file can be downloaded from the code folding control button in the upper right-hand corner of this page. The full RStudio project can be accessed via this GitHub repository. The text for this course is Social Network Analysis and Education: Theory, Methods & Applications (Carolan, 2014). The data for this project can be obtained from the text’s companion site.
Context
This independent analysis uses data from Daly’s Network of School Leaders presented in Chapter 3 of Carolan (2014). These data were collected at two school districts over 3 consecutive years and include:
- individual demographics (e.g., gender, ethnicity)
- network relationships (e.g., collaboration, confidential exchanges)
- frequency of interactions with others on a four-point scale ranging from 1 (the least frequent) to 4 (1–2 times a week)
- 18 efficacy items based on the Principal Efficacy Scale used in the Daly et al. (2011) and Tschannen-Moran and Gareis’s (2004) studies and rated on a 9-point Likert scale ranging from 1 (None at all) to 9 (A great deal)
- 8 trust items rated on a 7-point Likert scale ranging from 1 (Strongly disagree) to 7 (Strongly agree) modified from Tschannen-Moran and Hoy (2003).
Guiding Questions and Methods
This study focuses on the confidential exchanges between school leaders in year 1 and is framed by the following questions:
1. Does gender or some other individual attribute predict confidential exchanges between school leaders?
2. Do school leaders prefer to confide in others at the same level of leadership (i.e., school or district)?
The analysis employs Exponential-family Random Graph Models (ERGMs) to answer these questions.
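For readers new to ERGMs, the general form of the model may help frame these questions. The notation below is the standard textbook form rather than anything taken from this analysis: $y$ is the observed network, $g(y)$ is a vector of network statistics (edge count, mutual dyads, attribute matches, and so on), $\theta$ holds the coefficients, $\kappa(\theta)$ is the normalizing constant, and $\delta_{ij}(y)$ is the change in $g(y)$ when the tie from $i$ to $j$ is switched from absent to present.

```latex
P(Y = y \mid \theta) = \frac{\exp\{\theta^{\top} g(y)\}}{\kappa(\theta)},
\qquad
\operatorname{logit} P(Y_{ij} = 1 \mid y_{-ij}) = \theta^{\top} \delta_{ij}(y)
```

The second expression is what makes the coefficients readable: each one is the change in the log-odds of a tie for a one-unit change in the corresponding statistic, holding the rest of the network fixed.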
R Packages
The following R packages were used for the analyses carried out in this study.
library(readxl)
library(here)
library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)
library(deldir)
library(statnet)
Import the Data
The data were imported from the Carolan (2014) text’s Chapter 9 Excel files provided on the text companion site. There are two Excel files:
- School_Leaders_Data_Chapter_9_e.xlsx provides the individual demographic data for the 43 school leaders in the network.
- School_Leaders_Data_Chapter_9_c.xlsx provides the adjacency matrix of “confidential exchange” ties among these 43 school leaders in year 1 of the study.
leader_nodes <- read_excel(here("Presentation",
                                "data",
                                "School_Leaders_Data_Chapter_9_e.xlsx"),
                           col_types = c("text", "numeric",
                                         "numeric", "numeric", "numeric"))
leader_matrix <- read_excel(here("Presentation",
                                 "data", "School_Leaders_Data_Chapter_9_c.xlsx"),
                            col_names = FALSE)
WRANGLE
I’ve got all my libraries loaded and the data has been imported. So, let’s go wrangle some data…
Dichotomize the Matrix
Dichotomizing is recoding edge values to 1s and 0s so that the valued matrix is a binary matrix. To accomplish this, I first create the matrix object.
leader_matrix <- leader_matrix %>%
  as.matrix()
Then I dichotomize it by setting less frequent ties (values of 2 or lower) to 0 and more frequent ties (values of 3 or higher) to 1.
leader_matrix[leader_matrix <= 2] <- 0
leader_matrix[leader_matrix >= 3] <- 1
Next, I add the node names as row and column names for the purpose of generating an edge list for this network.
rownames(leader_matrix) <- leader_nodes$ID
colnames(leader_matrix) <- leader_nodes$ID
Create Network Graph
In order to extract the edge list, I use the matrix to create an igraph graph object.
adjacency_matrix <- graph.adjacency(leader_matrix,
                                    diag = FALSE)
class(adjacency_matrix)
## [1] "igraph"
Edge list
All this setup allows me to extract the edge list from the igraph graph object. The result is a set of 25 ties between the school leaders.
leader_edges <- get.data.frame(adjacency_matrix)
knitr::kable(leader_edges)
| from | to |
|---|---|
| 7 | 33 |
| 9 | 5 |
| 9 | 17 |
| 10 | 8 |
| 11 | 40 |
| 14 | 42 |
| 19 | 37 |
| 20 | 10 |
| 20 | 23 |
| 21 | 40 |
| 22 | 29 |
| 24 | 27 |
| 24 | 34 |
| 27 | 24 |
| 27 | 34 |
| 28 | 18 |
| 28 | 37 |
| 32 | 24 |
| 35 | 11 |
| 35 | 21 |
| 36 | 29 |
| 38 | 37 |
| 39 | 19 |
| 40 | 37 |
| 42 | 8 |
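The wrangling steps above (dichotomize, label, extract the edge list) can be sketched on a toy matrix. This is a minimal base R illustration, not the code used in the analysis: `toy_matrix`, `toy_ids`, and `toy_edges` are hypothetical names, the values are invented, and `which(..., arr.ind = TRUE)` stands in for the igraph conversion.

```r
# Toy valued matrix (invented data): entry [i, j] is how often
# leader i reports confiding in leader j, on a 0-4 scale.
toy_matrix <- matrix(c(0, 4, 1,
                       2, 0, 3,
                       0, 1, 0),
                     nrow = 3, byrow = TRUE)

# Dichotomize with the same cut-off as above: 0-2 becomes 0, 3-4 becomes 1.
toy_matrix[toy_matrix <= 2] <- 0
toy_matrix[toy_matrix >= 3] <- 1

# Label rows and columns with node IDs, as done with leader_nodes$ID.
toy_ids <- c("1", "2", "3")
dimnames(toy_matrix) <- list(toy_ids, toy_ids)

# Ignore self-ties, then read every remaining 1 as a directed tie
# from the row node to the column node.
diag(toy_matrix) <- 0
tie_index <- which(toy_matrix == 1, arr.ind = TRUE)
toy_edges <- data.frame(from = rownames(toy_matrix)[tie_index[, "row"]],
                        to   = colnames(toy_matrix)[tie_index[, "col"]])
print(toy_edges)
```

Only the two ratings of 3 or higher survive, so the toy edge list has two rows. This is the same thinning that reduces the real matrix to 25 ties.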
Network Graph
With an edge list, I can now create a network graph object using my edges and nodes. This is shown in the following code chunk.
leader_graph <- tbl_graph(edges = leader_edges,
                          nodes = leader_nodes,
                          directed = TRUE)
leader_graph
## # A tbl_graph: 43 nodes and 25 edges
## #
## # A directed simple graph with 21 components
## #
## # Node Data: 43 x 5 (active)
## ID EFFICACY TRUST `DISTRICT/SITE` MALE
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1 6.06 4 1 0
## 2 2 6.56 5.63 1 0
## 3 3 7.39 4.63 1 0
## 4 4 4.89 4 1 0
## 5 5 6.06 5.75 0 1
## 6 6 7.39 4.38 0 0
## # … with 37 more rows
## #
## # Edge Data: 25 x 2
## from to
## <int> <int>
## 1 7 33
## 2 9 5
## 3 9 17
## # … with 22 more rows
As you can see, the result is a directed graph with 43 nodes (the school leaders) and 25 edges. Only 28 of the 43 leaders appear in the edge list, so the remaining 15 either did not engage in confidential exchanges in year 1 or did so too infrequently (a rating of 2 or lower) to survive dichotomization.
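As a quick cross-check on the printed graph, the number of isolated leaders and the "21 components" can be recomputed from the edge list alone. The sketch below uses base R only and assumes that node IDs 1 to 43 match node positions, as the printed node data suggests. The label-merging loop is a simple stand-in for a graph library's component routine, and the variable names are mine.

```r
# Year 1 edge list, copied from the table above (from -> to).
from <- c(7, 9, 9, 10, 11, 14, 19, 20, 20, 21, 22, 24, 24,
          27, 27, 28, 28, 32, 35, 35, 36, 38, 39, 40, 42)
to   <- c(33, 5, 17, 8, 40, 42, 37, 10, 23, 40, 29, 27, 34,
          24, 34, 18, 37, 24, 11, 21, 29, 37, 19, 37, 8)
n_nodes <- 43

# Leaders that appear at either end of at least one tie.
connected <- sort(unique(c(from, to)))
n_isolates <- n_nodes - length(connected)

# Weak components by label merging: each node starts in its own component,
# and every tie relabels the receiver's component with the sender's label.
component <- seq_len(n_nodes)
for (k in seq_along(from)) {
  a <- component[from[k]]
  b <- component[to[k]]
  component[component == b] <- a
}
n_components <- length(unique(component))

cat("leaders with at least one tie:", length(connected), "\n")
cat("isolates:", n_isolates, "\n")
cat("weak components:", n_components, "\n")
```

The component count includes every isolate as a component of size one, which is why it is so much larger than the handful of visible clusters.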
EXPLORE
The following sections will provide some basic summary statistics of these data and a sociogram for visualizing the network.
Node degrees
Before getting the summary statistics, I calculate the in_degree and out_degree of each node and add them to the node data. The updated graph is printed below.
## # A tbl_graph: 43 nodes and 25 edges
## #
## # A directed simple graph with 21 components
## #
## # Node Data: 43 x 7 (active)
## ID EFFICACY TRUST `DISTRICT/SITE` MALE in_degree out_degree
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 6.06 4 1 0 0 0
## 2 2 6.56 5.63 1 0 0 0
## 3 3 7.39 4.63 1 0 0 0
## 4 4 4.89 4 1 0 0 0
## 5 5 6.06 5.75 0 1 1 0
## 6 6 7.39 4.38 0 0 0 0
## # … with 37 more rows
## #
## # Edge Data: 25 x 2
## from to
## <int> <int>
## 1 7 33
## 2 9 5
## 3 9 17
## # … with 22 more rows
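The chunk that computed these degrees is not shown in the output above, so here is a minimal base R sketch of the same idea, working directly from the edge list. It is an illustration rather than the original code (which may well have used tidygraph's degree helpers), and it again assumes node IDs 1 to 43 match node positions.

```r
# Year 1 edge list, copied from the edge table (from -> to).
from <- c(7, 9, 9, 10, 11, 14, 19, 20, 20, 21, 22, 24, 24,
          27, 27, 28, 28, 32, 35, 35, 36, 38, 39, 40, 42)
to   <- c(33, 5, 17, 8, 40, 42, 37, 10, 23, 40, 29, 27, 34,
          24, 34, 18, 37, 24, 11, 21, 29, 37, 19, 37, 8)
n_nodes <- 43

# Out-degree counts how often a node appears as a sender,
# in-degree how often it appears as a receiver.
out_degree <- tabulate(from, nbins = n_nodes)
in_degree  <- tabulate(to, nbins = n_nodes)

degrees <- data.frame(ID = seq_len(n_nodes), in_degree, out_degree)
print(head(degrees))
print(summary(degrees[, c("in_degree", "out_degree")]))
```

Because every tie has exactly one sender and one receiver, the two columns always share the same total and therefore the same mean.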
Node Measures
With the in_degree and out_degree added to the network’s measures, I can get the summary of these measures.
## ID EFFICACY TRUST DISTRICT.SITE
## Length:43 Min. :4.610 Min. :3.630 Min. :0.0000
## Class :character 1st Qu.:5.670 1st Qu.:4.130 1st Qu.:0.0000
## Mode :character Median :6.780 Median :4.780 Median :0.0000
## Mean :6.649 Mean :4.783 Mean :0.4186
## 3rd Qu.:7.470 3rd Qu.:5.440 3rd Qu.:1.0000
## Max. :8.500 Max. :5.880 Max. :1.0000
## MALE in_degree out_degree
## Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.4419 Mean :0.5814 Mean :0.5814
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :4.0000 Max. :2.0000
Breaking it Down: School and District Level Descriptives
The following sections provide descriptive statistics for the in-degree, out-degree, trust, and efficacy of the network, broken down by level of administration (i.e., school or district level).
In Degree
| DISTRICT.SITE | n | mean | sd |
|---|---|---|---|
| 0 | 25 | 0.4400000 | 0.5830952 |
| 1 | 18 | 0.7777778 | 1.1659662 |
Out Degree
| DISTRICT.SITE | n | mean | sd |
|---|---|---|---|
| 0 | 25 | 0.6000000 | 0.7071068 |
| 1 | 18 | 0.5555556 | 0.7838234 |
Trust
| DISTRICT.SITE | n | mean | sd |
|---|---|---|---|
| 0 | 25 | 4.9044 | 0.7191666 |
| 1 | 18 | 4.6150 | 0.6880514 |
Efficacy
| DISTRICT.SITE | n | mean | sd |
|---|---|---|---|
| 0 | 25 | 7.022000 | 0.9534149 |
| 1 | 18 | 6.131667 | 1.1151906 |
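Tables like these can be produced with a split-apply-combine step. The sketch below is a base R illustration on invented data: `toy_nodes` and its values are hypothetical, and only the column names mirror the output above. Its last lines use the real group sizes and means from the tables to confirm that they aggregate back to the overall degree mean.

```r
# Hypothetical node table: column names mirror the output above,
# but the values are invented for illustration.
toy_nodes <- data.frame(
  DISTRICT.SITE = c(0, 0, 0, 1, 1, 1),
  in_degree     = c(0, 1, 2, 0, 0, 3)
)

# n, mean, and sd of in_degree within each level of administration.
by_level <- do.call(rbind, lapply(
  split(toy_nodes, toy_nodes$DISTRICT.SITE),
  function(g) {
    data.frame(DISTRICT.SITE = g$DISTRICT.SITE[1],
               n    = nrow(g),
               mean = mean(g$in_degree),
               sd   = sd(g$in_degree))
  }
))
print(by_level)

# Consistency check with the real tables: weighting each group mean by its
# group size should reproduce the overall mean degree of 25 / 43.
group_n <- c(25, 18)
overall_in  <- sum(group_n * c(0.4400000, 0.7777778)) / sum(group_n)
overall_out <- sum(group_n * c(0.6000000, 0.5555556)) / sum(group_n)
print(c(overall_in = overall_in, overall_out = overall_out))
```

Both weighted means come out at about 0.58, matching the summary in the Node Measures section.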
Comparison to Carolan (2014)
Examining these descriptive statistics, it appears I have successfully replicated the results shown in Table 9.1 of Carolan (2014) for school leaders’ confidential exchanges in year 1. Table 9.1 is shown below.
Visualize the Network
As you can see below, the year 1 network looks quite different from the year 3 network: there are far fewer confidential exchanges in year 1. Coloring the nodes by gender shows that the clusters of confidential exchanges are not made up of a single gender. The node shapes (square for district-level and circle for school-level leaders), however, suggest that each cluster consists mostly of either district-level or school-level leaders. This suggests that administration level may be important in year 1 confidential networks. I’ll take this into consideration as I use ERGM to model the network data.
ggraph(leader_measures, layout = "fr") +
  geom_node_point(aes(size = out_degree,
                      colour = factor(MALE),
                      fill = factor(MALE)),
                  shape = node_measures$DISTRICT.SITE + 21, show.legend = TRUE) +
  geom_node_text(aes(label = ID,
                     size = 1.5),
                 repel = TRUE, show.legend = FALSE) +
  geom_edge_link(arrow = arrow(length = unit(1, 'mm')),
                 end_cap = circle(2, 'mm'),
                 alpha = .3) +
  theme_graph()
MODEL
To set up the ERGM, I create a network object from these edges and nodes. The result is a directed network with 43 nodes (school leaders) and 25 edges.
## Network attributes:
## vertices = 43
## directed = TRUE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges= 25
## missing edges= 0
## non-missing edges= 25
##
## Vertex attribute names:
## DISTRICT/SITE EFFICACY MALE TRUST vertex.names
##
## No edge attributes
ERGM
To address my guiding questions, I ran the ERGM with all the individual attributes, a mutual term for reciprocity, a nodematch term for level of administration, and the geometrically weighted edgewise shared partners (GWESP) term for transitivity.
## Call:
## ergm(formula = leader_network ~ edges + mutual + gwesp(0.25,
## fixed = T) + nodefactor("MALE") + nodecov("EFFICACY") + nodecov("TRUST") +
## nodematch("DISTRICT/SITE"))
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -6.4576 2.1573 0 -2.993 0.00276 **
## mutual 0.9490 1.1461 0 0.828 0.40763
## gwesp.fixed.0.25 1.1949 0.4299 0 2.779 0.00544 **
## nodefactor.MALE.1 0.3721 0.2629 0 1.415 0.15706
## nodecov.EFFICACY -0.1263 0.1220 0 -1.035 0.30046
## nodecov.TRUST 0.2381 0.1812 0 1.314 0.18877
## nodematch.DISTRICT/SITE 1.5357 0.5310 0 2.892 0.00383 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 2503.6 on 1806 degrees of freedom
## Residual Deviance: 241.8 on 1799 degrees of freedom
##
## AIC: 255.8 BIC: 294.3 (Smaller is better. MC Std. Err. = 0.04752)
While gender, efficacy, and trust do not appear to be significant predictors of confidential exchanges in year 1, the GWESP and level-of-administration (nodematch) terms are significant in this model (p < .01 for both).
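The numbers in the summary table can be re-derived by hand, which helps in reading them. The sketch below is a base R check, not part of the original analysis: it copies the estimates and standard errors from the output above (the short names are mine) and recomputes the z values, p-values, null deviance, AIC, and BIC. It relies on two things: the 1806 degrees of freedom are the 43 × 42 possible directed ties, and the null model gives every tie a probability of 0.5.

```r
# Estimates and standard errors copied from the summary output above.
estimate <- c(edges = -6.4576, mutual = 0.9490, gwesp = 1.1949,
              male = 0.3721, efficacy = -0.1263, trust = 0.2381,
              same_level = 1.5357)
std_error <- c(2.1573, 1.1461, 0.4299, 0.2629, 0.1220, 0.1812, 0.5310)

# Wald z statistics and two-sided p-values.
z_value <- estimate / std_error
p_value <- 2 * pnorm(-abs(z_value))
print(round(cbind(estimate, std_error, z_value, p_value), 5))

# Number of possible directed ties among 43 leaders (no self-ties).
n_dyads <- 43 * 42

# Null deviance: -2 * log-likelihood when every tie has probability 0.5.
null_deviance <- 2 * n_dyads * log(2)

# Information criteria from the reported residual deviance.
residual_deviance <- 241.8
n_params <- length(estimate)
aic <- residual_deviance + 2 * n_params
bic <- residual_deviance + n_params * log(n_dyads)
print(round(c(null_deviance = null_deviance, AIC = aic, BIC = bic), 1))
```

The recomputed values agree with the printed output to rounding, which confirms that the significance stars rest on ordinary Wald tests of estimate divided by standard error.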
Model Fit
Let’s just take a look at the fit of this model using the gof() function and running some mcmc.diagnostics().
Goodness of Fit
Overall, the results of the gof() function show that this model fits reasonably well, with the exception of the mutual term, which isn’t surprising given its p-value.
my_ergm_gof <- gof(my_ergm)
plot(my_ergm_gof)
MCMC Diagnostics
Next, I examine the Markov chain Monte Carlo (MCMC) diagnostics for this model.
mcmc.diagnostics(my_ergm)
## Sample statistics summary:
##
## Iterations = 472064:8388608
## Thinning interval = 1024
## Number of chains = 1
## Sample size per chain = 7732
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## edges 1.5210 7.407 0.08424 0.21280
## mutual 0.3341 1.710 0.01945 0.05852
## gwesp.fixed.0.25 1.2339 5.297 0.06024 0.20170
## nodefactor.MALE.1 1.8862 10.480 0.11918 0.32189
## nodecov.EFFICACY 20.1004 96.883 1.10180 2.70287
## nodecov.TRUST 15.4958 75.448 0.85802 2.24566
## nodematch.DISTRICT/SITE 1.3991 7.057 0.08025 0.20862
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## edges -10.00 -4.00 1.00 5.00 19.00
## mutual -1.00 -1.00 0.00 1.00 5.00
## gwesp.fixed.0.25 -2.00 -2.00 -1.00 2.00 17.88
## nodefactor.MALE.1 -14.00 -5.00 0.00 7.00 29.00
## nodecov.EFFICACY -131.70 -47.09 8.07 71.95 254.90
## nodecov.TRUST -99.53 -36.62 5.14 53.83 201.37
## nodematch.DISTRICT/SITE -9.00 -3.00 0.00 5.00 19.00
##
##
## Are sample statistics significantly different from observed?
## edges mutual gwesp.fixed.0.25 nodefactor.MALE.1
## diff. 1.520952e+00 3.340662e-01 1.233867e+00 1.886187e+00
## test stat. 7.147335e+00 5.708596e+00 6.117478e+00 5.859790e+00
## P-val. 8.847909e-13 1.139116e-08 9.506787e-10 4.634516e-09
## nodecov.EFFICACY nodecov.TRUST nodematch.DISTRICT/SITE
## diff. 2.010037e+01 1.549576e+01 1.399121e+00
## test stat. 7.436689e+00 6.900326e+00 6.706476e+00
## P-val. 1.032403e-13 5.188316e-12 1.993809e-11
## Overall (Chi^2)
## diff. NA
## test stat. 1.604129e+02
## P-val. 1.037157e-30
##
## Sample statistics cross-correlations:
## edges mutual gwesp.fixed.0.25 nodefactor.MALE.1
## edges 1.0000000 0.7055074 0.7546630 0.9188177
## mutual 0.7055074 1.0000000 0.8517864 0.7327079
## gwesp.fixed.0.25 0.7546630 0.8517864 1.0000000 0.7927996
## nodefactor.MALE.1 0.9188177 0.7327079 0.7927996 1.0000000
## nodecov.EFFICACY 0.9960890 0.6944590 0.7421291 0.9117900
## nodecov.TRUST 0.9966234 0.7218665 0.7735616 0.9221310
## nodematch.DISTRICT/SITE 0.9601530 0.7273599 0.7744386 0.8945465
## nodecov.EFFICACY nodecov.TRUST nodematch.DISTRICT/SITE
## edges 0.9960890 0.9966234 0.9601530
## mutual 0.6944590 0.7218665 0.7273599
## gwesp.fixed.0.25 0.7421291 0.7735616 0.7744386
## nodefactor.MALE.1 0.9117900 0.9221310 0.8945465
## nodecov.EFFICACY 1.0000000 0.9936731 0.9570709
## nodecov.TRUST 0.9936731 1.0000000 0.9600324
## nodematch.DISTRICT/SITE 0.9570709 0.9600324 1.0000000
##
## Sample statistics auto-correlation:
## Chain 1
## edges mutual gwesp.fixed.0.25 nodefactor.MALE.1
## Lag 0 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1024 0.3964813 0.6385957 0.7739151 0.4775483
## Lag 2048 0.3222917 0.5407671 0.6611434 0.4038687
## Lag 3072 0.2788877 0.4588149 0.5608703 0.3473963
## Lag 4096 0.2375864 0.3892603 0.4783085 0.2989556
## Lag 5120 0.1872085 0.3287127 0.4041185 0.2441561
## nodecov.EFFICACY nodecov.TRUST nodematch.DISTRICT/SITE
## Lag 0 1.0000000 1.0000000 1.0000000
## Lag 1024 0.3785797 0.4246852 0.4250971
## Lag 2048 0.3076251 0.3467390 0.3485443
## Lag 3072 0.2643215 0.2978761 0.2901923
## Lag 4096 0.2243780 0.2554762 0.2549752
## Lag 5120 0.1752310 0.2044617 0.2017691
##
## Sample statistics burn-in diagnostic (Geweke):
## Chain 1
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
## edges mutual gwesp.fixed.0.25
## -0.6193 -0.2562 -0.2602
## nodefactor.MALE.1 nodecov.EFFICACY nodecov.TRUST
## -0.3573 -0.6116 -0.6141
## nodematch.DISTRICT/SITE
## -0.5583
##
## Individual P-values (lower = worse):
## edges mutual gwesp.fixed.0.25
## 0.5356962 0.7978305 0.7946715
## nodefactor.MALE.1 nodecov.EFFICACY nodecov.TRUST
## 0.7208672 0.5408310 0.5391571
## nodematch.DISTRICT/SITE
## 0.5766220
## Joint P-value (lower = worse): 0.7492824 .
##
## MCMC diagnostics shown here are from the last round of simulation, prior to computation of final parameter estimates. Because the final estimates are refinements of those used for this simulation run, these diagnostics may understate model performance. To directly assess the performance of the final model on in-model statistics, please use the GOF command: gof(ergmFitObject, GOF=~model).
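Overall, the diagnostics are a little difficult to interpret. The Geweke burn-in statistics give no sign of trouble (joint p-value of about 0.75), but the sampled statistics differ significantly from the observed ones and the autocorrelation is still sizeable at lag 1024, so I read this output as only tentative evidence that the estimation behaved well. As the note at the end of the output points out, these diagnostics come from the last simulation round before the final estimates were computed and may understate model performance.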
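Two parts of this output become easier to read with a little arithmetic. The sketch below is a base R illustration using numbers copied from the output above; the variable names are mine. It converts the Geweke z statistics into the reported p-values, and it estimates an effective sample size for the edges and GWESP statistics from the ratio of the naive to the time-series standard error.

```r
# Geweke burn-in z statistics copied from the diagnostics output.
geweke_z <- c(edges = -0.6193, mutual = -0.2562, gwesp = -0.2602,
              male = -0.3573, efficacy = -0.6116, trust = -0.6141,
              same_level = -0.5583)

# Two-sided p-values: large values mean the early and late parts of the
# chain look alike, i.e. no sign of a burn-in problem.
geweke_p <- 2 * pnorm(-abs(geweke_z))
print(round(geweke_p, 4))

# Effective sample size: autocorrelated draws carry less information than
# independent ones. With naive SE = sd / sqrt(n) and the time-series SE
# accounting for autocorrelation, ESS = n * (naive SE / time-series SE)^2.
n_draws  <- 7732
naive_se <- c(edges = 0.08424, gwesp = 0.06024)
ts_se    <- c(edges = 0.21280, gwesp = 0.20170)
ess <- n_draws * (naive_se / ts_se)^2
print(round(ess))
```

Under this approximation the 7,732 sampled networks are worth roughly 1,200 independent draws for the edge count and about 700 for GWESP, which is one concrete way to read the high lag-1024 autocorrelations.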
COMMUNICATE
This study was framed by the following questions:
1. Does gender or some other individual attribute predict confidential exchanges between school leaders?
2. Do school leaders prefer to confide in others at the same level of leadership (i.e., school or district)?
For question 1, the results in the previous sections suggest that none of the individual attributes in the model (gender, efficacy, or trust) significantly predicts confidential exchanges in year 1. For question 2, the significant, positive nodematch term indicates that school leaders appear to prefer to confide in others at the same level of administration (school or district), at least in year 1. Confidential exchanges also appear to be significantly shaped by transitivity, represented by the geometrically weighted edgewise shared partners term (i.e., the “friend of a friend” phenomenon).
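To put the same-level effect on a more intuitive scale, the coefficient can be turned into an odds ratio and an approximate tie probability. The sketch below is a base R illustration, not part of the original analysis, and it rests on simplifying assumptions: two female leaders who both sit at the sample means for efficacy (6.649) and trust (4.783), and a potential tie that would neither reciprocate an existing tie nor create a shared partner, so the mutual and GWESP terms contribute nothing.

```r
# Coefficients copied from the ERGM summary.
b_edges      <- -6.4576
b_efficacy   <- -0.1263
b_trust      <-  0.2381
b_same_level <-  1.5357
se_same_level <- 0.5310

# Odds ratio for a same-level tie, with a 95% Wald interval.
odds_ratio <- exp(b_same_level)
odds_ci <- exp(b_same_level + c(-1, 1) * qnorm(0.975) * se_same_level)
print(round(c(odds_ratio = odds_ratio,
              lower = odds_ci[1], upper = odds_ci[2]), 2))

# Conditional log-odds of a tie between two "average" female leaders.
# nodecov terms use the sum of the two leaders' attribute values.
mean_efficacy <- 6.649
mean_trust    <- 4.783
log_odds_diff <- b_edges +
  b_efficacy * (2 * mean_efficacy) +
  b_trust * (2 * mean_trust)
log_odds_same <- log_odds_diff + b_same_level

# plogis() converts log-odds to probabilities.
p_diff <- plogis(log_odds_diff)
p_same <- plogis(log_odds_same)
print(round(c(different_level = p_diff, same_level = p_same), 4))
```

Under these assumptions the odds of a confidential tie are roughly 4.6 times higher within a level than across levels, yet the probability of any single tie stays small (on the order of 1%) because the network is so sparse.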
GET THE PROJECT CODE
If you’d like to learn more about the techniques used in this analysis (including the R Markdown techniques used in the write-up), you can get the R project from my GitHub repo:
THE END OF AN ERA
This is the bittersweet end of my coursework in the Learning Analytics Certificate Program at NC State (and ultimately, my doctoral program coursework as well). I have enjoyed these courses and the learning community I’ve gained in them, but I am very much looking forward to completing my dissertation and graduating.